A machine learning pipeline to improve De Bruijn graph metatranscriptomic assemblies

Author

  • Hussein Mohsen
Abstract

Motivation: As metatranscriptomic assemblies grow in significance, it has become essential to improve their quality while keeping their size manageable, since every application built on metatranscriptomic assembly benefits. In this paper, we propose a pipeline that filters de novo assemblies while preserving or improving their quality. The original assemblies are based on De Bruijn graphs and were created with Oases. Auxiliary scripts that report statistics on all kinds of metatranscriptomic assemblies are also integrated into the pipeline.

Results: Experimental results show that the pipeline improved assembly accuracy by up to 6+% while filtering 5000+ transcripts from 6 original assemblies, each made up of 21000+ transcripts. The high precision of the filtered assemblies and the reasonable running time of the pipeline make it a potential postprocessing step for different de novo assemblies.

Availability: All pipeline scripts are publicly available at https://sourceforge.net/projects/metatranspipeline/files/

Contact: [email protected]


Similar resources

Utilizing de Bruijn graph of metagenome assembly for metatranscriptome analysis

MOTIVATION Metagenomics research has accelerated the studies of microbial organisms, providing insights into the composition and potential functionality of various microbial communities. Metatranscriptomics (studies of the transcripts from a mixture of microbial species) and other meta-omics approaches hold even greater promise for providing additional insights into functional and regulatory ch...


A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics

Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes ...


HINGE: long-read assembly achieves optimal repeat resolution.

Long-read sequencing technologies have the potential to produce gold-standard de novo genome assemblies, but fully exploiting error-prone reads to resolve repeats remains a challenge. Aggressive approaches to repeat resolution often produce misassemblies, and conservative approaches lead to unnecessary fragmentation. We present HINGE, an assembler that seeks to achieve optimal repeat resolution...


MetaVelvet-SL: an extension of the Velvet assembler to a de novo metagenomic assembler utilizing supervised learning

The assembly of multiple genomes from mixed sequence reads is a bottleneck in metagenomic analysis. A single-genome assembly program (assembler) is not capable of resolving metagenome sequences, so assemblers designed specifically for metagenomics have been developed. MetaVelvet is an extension of the single-genome assembler Velvet. It has been proved to generate assemblies with higher N50 scor...


AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references

MOTIVATION De novo assemblies of genomes remain one of the most challenging applications in next-generation sequencing. Usually, their results are incomplete and fragmented into hundreds of contigs. Repeats in genomes and sequencing errors are the main reasons for these complications. With the rapidly growing number of sequenced genomes, it is now feasible to improve assemblies by guiding them ...




Publication date: 2015